ABSTRACT

We present an extended morphometric system to automatically classify galaxies from astronomical
images. The new system includes the original and modified versions of the CASGM coefficients
(Concentration $C_1$, Asymmetry $A_3$, and Smoothness $S_3$), and the new parameters entropy, $H$,
and spirality, $\sigma_\psi$. The new parameters $A_3$, $S_3$ and $H$ discriminate galaxy classes
better than $A_1$, $S_1$ and $G$, respectively. The new parameter $\sigma_\psi$ captures the amount
of non-radial pattern in the image and is almost linearly dependent on T-type. Using a sample of
spiral and elliptical galaxies from the Galaxy Zoo project as a training set, we employed the
Linear Discriminant Analysis (LDA) technique to classify the Baillard et al. (2011; 4478 galaxies),
Nair & Abraham (2010; 14123 galaxies) and SDSS Legacy (779,235 galaxies) samples. The
cross-validation test shows that we can achieve an accuracy of more than 90% with our
classification scheme. Therefore, we are able to define a plane in the morphometric parameter
space that separates the elliptical and spiral classes with a mismatch between classes smaller
than 10%. We use the distance to this plane as a morphometric index ($M_i$) and we show that it
follows the human-based T-type index very closely. We calculate the morphometric index $M_i$ for
~780k galaxies from the SDSS Legacy Survey DR7. We discuss how $M_i$ correlates with stellar
population parameters obtained using the spectra available from SDSS-DR7.

Subject headings: galaxies: morphology, morphometry, classification; methods: statistical 


1. INTRODUCTION 

The study of the formation and evolution of galaxies in general requires their systematic
observation over a large redshift domain. Datasets for local galaxies (today's systems) and
distant galaxies (their actual progenitors) must be consistently gathered to avoid biases, a
procedure that requires a knowledge of the very evolution we are seeking to understand. From an
astrophysical perspective, mechanisms regulating star formation, e.g. ram pressure (Gunn & Gott
1972), harassment (Moore et al. 1996) and starvation (Larson et al. 1980), depend on the distance
from the center of the potential well of a cluster; their effects on the stellar population of a
galaxy depend on how efficiently the local environment is capable of removing the interstellar gas
and affecting the star formation history. Thus, morphology, in a general sense, is just a snapshot
reflecting all these processes imprinted in the galaxy image at a given time.

Traditionally, galaxy morphology has been addressed visually: an expert examines the images of
galaxies and identifies features (or the absence of them, in the case of [...]) belonging to a
specific class, as done in Hubble (1926); de Vaucouleurs (1959); Sandage (1975); van den Bergh
(1976); Lintott et al. (2008), among many others. This classification paradigm is strongly
subjective, prone to errors, and cannot be applied to the number of galaxies present in modern
surveys. For instance, Fasano et al. (2012) compare their classification with that of de
Vaucouleurs et al. (1991, RC3 catalog) (see their Figure 2). As is clearly seen there, there is an
uncertainty of approximately 1 in T-type among the Fasano et al. (2012) authors and of about 2.5
between RC3 and Fasano et al. (2012). A comparison between EFIGI and NA2010, for 438 galaxies in
common (Baillard et al. 2011, Figure 32), exhibits a similar sort of inconsistency, with an
uncertainty between 2 and 3 in T-type. This tells us that, for example, visual classifications do
not agree when distinguishing an S0 from an Sa or an E from an S0. Thus, it is imperative to
quantify the morphology of a galaxy as a measurable quantity - morphometry - that can be coded in
an algorithm. However, in spite of its uncertainties, visual classification is still important,
because automated techniques would have difficulty producing classifications like
(R1R2)SAB(r,nr)0/a, which is much more valuable than just knowing whether a galaxy is S, S0, or E.
Also, regarding RC3, it should be noted that its classifications were based on photographic plates
or sky survey charts. Modern digital images allow greater consistency between multiple classifiers
and have the potential to greatly improve on RC3.

Two approaches for galaxy morphometry have been widely explored recently: parametric, those which
model the light distribution as a bulge plus disk plus other less important components
representing a few percent of the total light (e.g. Peng et al. 2002; Simard et al. 2002); and
non-parametric, those which use the measured properties of the light distribution, like
concentration and asymmetry (e.g. Abraham et al. 1996). Each approach has its virtues and vices,
as discussed for example in Andrae et al. (2011).

One relatively successful non-parametric system is the concentration, asymmetry, smoothness, Gini
and M20 (CASGM) system, presented in Abraham et al. (1994, 1996), Conselice et al. (2000) and Lotz
et al. (2004). This basic set has been enlarged with other quantities such as the Sérsic model
parameters (Sérsic 1968) and the Petrosian radius (Petrosian 1975), among others. These quantities
may not work properly in the high redshift regime, and this has been studied in recent papers
(e.g. Freeman et al. 2013). These authors use Multi-mode, Intensity and Deviation statistics (MID)
to detect disturbances in the galaxy light distribution and show that it is very effective at
z ~ 2.

The previously mentioned way of establishing galaxy morphology answers two immediate needs. First,
it is possible to reproduce human classification by positioning the galaxies in the space of these
parameters. In such a supervised classification, a set of visually classified galaxies is used to
train a discriminant function that will assign to each new galaxy a probability of belonging to
each class. The second reason for establishing a galaxy morphometry system is that we can seek
structures, in the quantitative morphology parameter space, that may yield clues to the physical
reasons for their formation and evolution and that are not visible in the current human-based
approach. Further, a system such as the Hubble tuning fork classification does not account for all
the details that we can currently measure in galaxy images, and it does not hold as we go deep,
even at a moderate redshift z = 0.25. To evaluate this, a new quantitative classification
procedure is needed, both to handle the large amount of data becoming available with the new
surveys, and to help us find the physical processes driving galaxy evolution.

The paper is organized as follows: in Section 2 we discuss similar works; in Section 3 we describe
the datasets used; in Section 4 we define new nonparametric methods to quantify galaxy morphology;
in Section 5 we present the Morfometryka algorithm. We apply the Morfometryka code to the galaxy
samples in Section 6, where we also test the robustness of the measured parameters and explore
their ability to classify galaxy morphologies. In Section 7 we propose a new Morphometric Index
$M_i$. In Section 8 we compare $M_i$ with other physical parameters, and a summary is presented in
Section 9.

2. RELATED WORK

There have been several attempts to classify galaxies automatically, beginning with Abraham et al.
(1994, 1996), among others. Here we briefly mention recent works based on machine learning and on
morphometric parameters. The list is not meant to be exhaustive but rather to present different
approaches followed in the last few years.

Huertas-Company et al. (2011) used a system based on colors, shapes and concentration to train a
support vector machine to classify ~700k galaxies from the SDSS DR7 spectroscopic sample. For each
galaxy, they estimate the probabilities of being E, S0, Sab or Scd. It is not a pure morphometric
classification, since it includes colors.

Scarlata et al. (2007) analyse 56,000 COSMOS galaxies with the ZEST algorithm, using five
nonparametric diagnostics ($A$, $C$, $G$, $M_{20}$, $q$) and the Sérsic index $n$. They perform
Principal Component Analysis (PCA) and classify galaxies with 3 principal components. They find
contamination between galaxy classes in parameter space (see their Fig. 10), although they do not
state clearly the success rate of the classification.

Andrae et al. (2011) present a detailed analysis of several critical issues when dealing with
galaxy morphology and classification. Several morphological features are intertwined and cannot be
estimated independently. They show the dependence between $C$ and $n$, which is also presented
here in a different form in the Appendix. The authors claim that parameter-based approaches are
better for classification, and state that a system such as CASGM has serious problems. However,
they do not show it in practice.

Dieleman et al. (2015) present a neural network to reproduce the Galaxy Zoo classification. They
work directly in pixel space, using a rotation-invariant convolution that minimizes sensitivity to
changes in scale, rotation, translation and sampling of the image. The algorithm obtains an
accuracy of 99% relative to the Galaxy Zoo human classification; however, since the human
classification is also error prone, as discussed in Section 1, their algorithm also reproduces the
errors in the human classification.

Freeman et al. (2013) introduced MID (multimode, intensity and deviation) statistics designed to
detect disturbed morphologies, and then classify 1639 galaxies observed with HST WFC3 with a
random forest. It is one of the few works that state the detailed classifier performance, in terms
of the confusion matrix coefficients.

3. DATA AND SAMPLE SELECTION

We use several databases derived from SDSS DR7 (Abazajian et al. 2009), for which we analyze r
band images. They are: the Baillard et al. (2011) database (hereafter EFIGI); the Nair & Abraham
(2010) database (hereafter NA); and the SDSS DR7 complete Legacy database and a volume limited
subsample, hereafter referred to as LEGACY and LEGACY-zr, respectively. We also use the Galaxy Zoo
collaborative project visual classification (Lintott et al. 2008, 2011). The number of galaxies in
the original databases, those successfully processed by Morfometryka and those that have a Galaxy
Zoo classification are listed in Table 1.

TABLE 1
Number of objects in the databases used in this work. MFMTK are objects for which Morfometryka
made measurements; Zoo means objects classified as E or S by Galaxy Zoo.

               EFIGI       NA       LEGACY    LEGACY-zr
Total          4 458     14 034    804 974     337 097
MFMTK          4 214     12 729    779 235     327 937
MFMTK+Zoo      1 856      8 792    245 206     125 417


The databases are used for different purposes, namely training, validation and classification. In
the training phase, the galaxies in the databases that have a Galaxy Zoo classification are used
to train a classifier (Section 6.2). For validation, we use a cross-validation scheme to assess
how well our classifier performs compared to the Galaxy Zoo human classification. In the
classification stage, we use the trained classifier to linearly separate LEGACY galaxies into two
classes (elliptical, E, or spiral, S) in the morphometric parameter space (Section 6.2). The
galaxy distance to the separating hyperplane is then proposed as a morphometric index $M_i$
(Section 7). The databases EFIGI and NA, for which we have T-type values, are further used to
support our argument that $M_i$, based on the classifier discriminant function, can reflect the
galaxy morphological type.

The classification scheme from the Galaxy Zoo project (Lintott et al. 2008, 2011) was used to
train our supervised morphometric classifier. The Galaxy Zoo project provides simple morphological
classifications of nearly 900,000 galaxies drawn from the SDSS-DR6.

Below we discuss each sample in detail. 

3.1. The EFIGI sample 

The EFIGI catalog was specifically designed to sample all Hubble morphological types. It provides
detailed morphological information of galaxies selected from standard surveys and catalogues
(Principal Galaxy Catalogue, Sloan Digital Sky Survey, Value-Added Galaxy Catalogue, HyperLeda,
and the NASA Extragalactic Database). The sample is essentially limited in apparent diameter, and
offers a detailed view of the whole Hubble sequence. The final EFIGI sample comprises 4458
galaxies for which there is imaging in all the ugriz bands in the SDSS-DR4 database. For these
galaxies, the EFIGI reference dataset provides visually estimated morphological information as
well as re-sampled SDSS imaging data. The photometric catalog is more than 80% complete for
galaxies with $10 < m_{\rm petro,g} < 14$, where $m_{\rm petro,g}$ is the Petrosian magnitude in
the g band.

3.2. The NA sample 


Nair & Abraham (2010) provide detailed visual classifications for 14034 galaxies selected from the
SDSS spectroscopic main sample described in Strauss et al. (2002). They used the SDSS-DR4
photometry catalogs to select all spectroscopically targeted galaxies in the redshift range
0.01 < z < 0.1 down to an apparent extinction-corrected magnitude limit of g < 16 mag. Objects
mistakenly classified as galaxies have been removed, leading to the final sample of 14,034
galaxies. Their final catalog provides T-types and the existence of bars, rings, lenses, tails,
warps, dust lanes, arm flocculence, and multiplicity for all galaxies.

3.3. The SDSS LEGACY and LEGACY-zr samples 

Our target sample of galaxies was retrieved from SDSS-DR7 (Abazajian et al. 2009) by selecting all
objects spectroscopically classified as galaxies (see Appendix A.2 for the full query). SDSS
Frames and psFields were obtained, and stamps and PSFs (point spread functions) were generated
from them (see Appendix A for details). Our final catalog comprises 804,974 objects.

The subsample LEGACY-zr is volume limited at redshift z < 0.1 and $m_{\rm petro,r} < 17.78$, where
$m_{\rm petro,r}$ is the extinction-corrected Petrosian magnitude in the r band. This magnitude
limit roughly corresponds to the magnitude at which the SDSS spectroscopy is complete (Strauss et
al. 2002). The redshift limit of z < 0.1 provides a complete sample for
$M_{\rm petro,r} < -20.5$, where $M_{\rm petro,r}$ is the SDSS Petrosian absolute magnitude in the
r band.

For 570,685 galaxies, those for which zWarning=0 in the SDSS database, we derived ages,
metallicities, stellar masses and velocity dispersions using the spectral fitting code STARLIGHT
(Cid Fernandes et al. 2005). Before running the code, the observed spectra are corrected for
foreground extinction and de-redshifted, and the single stellar population (SSP) models are
degraded to match the wavelength-dependent resolution of the SDSS spectra, as described in La
Barbera et al. (2010). We adopted the Cardelli et al. (1989) extinction law, assuming
$R_V = 3.1$.

We used SSP models based on the Medium resolution INT Library of Empirical Spectra (MILES,
Sánchez-Blázquez et al. 2006), using the code presented in Vazdekis et al. (2010), version 9.1
(Falcón-Barroso et al. 2011). They have a spectral resolution of ~2.5 Å, almost constant with
wavelength. We selected models computed with a Kroupa (2001) Universal IMF with slope = 1.30, and
isochrones by Girardi et al. (2000). The basis grids cover ages in the range 0.07 to 14.2 Gyr,
with constant log(Age) steps of 0.2. We selected SSPs with metallicities
[M/H] = {-1.71, -0.71, -0.38, 0.00, +0.20}.


4. QUANTITATIVE GALAXY MORPHOLOGY 
The basic morphometric measurements of the CASGM system are fully described by Abraham et al.
(1994, 1996), Bershady et al. (2000), Conselice et al. (2000) and Lotz et al. (2004), among
others. Relevant modifications of these parameters and the new parameters introduced in this work
are discussed in this section.

We define the Petrosian region as the region with the same axis ratio and position angle as the
galaxy (see Section A.2) and with major axis equal to $N_{R_p} R_p$, where $R_p$ is the Petrosian
radius and $N_{R_p} = 2$. Most measurements are made with pixels in this region, unless otherwise
stated. A central region of the size of the PSF FWHM is masked before calculating $A$, $S$ and
$\sigma_\psi$.


4.1. Concentration C1 and C2

The concentration index $C$ is the ratio of the circular radii containing two fractions of the
total flux of the galaxy (Kent 1985), where these fractions are chosen to maximize the distinction
between systems and minimize seeing effects. The concentration depends on the determination of the
radius that contains some fraction of some measure of the total luminosity of the galaxy. In this
work, we have adopted the Petrosian luminosity as the total luminosity $L_T$, which is the maximum
value of $L(R)$ inside the Petrosian region. The measured $L(R)$ is spline interpolated and then
the point where it attains some fraction $f$ of $L_T$ is found by evaluating the spline at that
point. In this way we obtain $R_{20}$, $R_{50}$, $R_{80}$ and $R_{90}$ and finally

$$ C_1 = \log_{10}\left(\frac{R_{80}}{R_{20}}\right), \qquad C_2 = \log_{10}\left(\frac{R_{90}}{R_{50}}\right). $$

Note that we dropped the factor 5 usually present in the definition of $C$, so that all
morphometric measurements used fall approximately in the range [0, 1], and thus statistical
standardization would have little effect and may be optional. The concentration $C_1$ is more
sensitive to seeing effects, which are more pronounced in the central regions and thus affect
$R_{20}$; $C_2$ is more sensitive to noise, which is more important in the outer regions and thus
affects the measure of $R_{90}$.
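
The computation of $C_1$ and $C_2$ from a measured curve of growth can be sketched as follows. This is a minimal illustration under stated assumptions, not the Morfometryka implementation: the function names are hypothetical, and linear interpolation stands in for the spline interpolation described above.

```python
import math


def radius_at_fraction(radii, lum, frac):
    """Radius where the cumulative luminosity first reaches frac * L_T.

    radii: increasing radii; lum: cumulative luminosity L(R) at those radii.
    L_T is taken as the maximum of L(R), as in the text. Linear
    interpolation is an assumption standing in for the spline.
    """
    target = frac * max(lum)
    for k in range(1, len(radii)):
        if lum[k - 1] <= target <= lum[k]:
            if lum[k] == lum[k - 1]:
                return radii[k - 1]
            t = (target - lum[k - 1]) / (lum[k] - lum[k - 1])
            return radii[k - 1] + t * (radii[k] - radii[k - 1])
    raise ValueError("fraction not reached inside the measured profile")


def concentrations(radii, lum):
    """Return (C1, C2) = (log10(R80/R20), log10(R90/R50))."""
    r20, r50, r80, r90 = (radius_at_fraction(radii, lum, f)
                          for f in (0.2, 0.5, 0.8, 0.9))
    return math.log10(r80 / r20), math.log10(r90 / r50)


# Toy profile (hypothetical): a uniform disk, for which L(R) grows as R^2.
radii = [i / 1000 for i in range(1, 1001)]
lum = [r * r for r in radii]
c1, c2 = concentrations(radii, lum)
```

For the uniform disk the radii scale as the square root of the enclosed fraction, so $C_1$ approaches $\log_{10} 2$, which gives a quick sanity check of the interpolation.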


4.2. Asymmetry A1, A2, A3

The asymmetry coefficient $A$ is determined by comparing a source image with its rotated
counterpart. We measure the asymmetry $A_1$ as defined by Abraham et al. (1996), with the
exception that we do not subtract the background asymmetry, for we find that this procedure makes
the asymmetry estimation unstable and sensitive to the selected sky (and hence to the stamp) size.
Instead, we consider only the galaxy portion inside the Petrosian region. Even so, $A_1$ depends
heavily on the noise and on the image sampling. To address these problems we use two new
asymmetry measurements defined as

$$ A_2 = 1 - r(I, I_\pi) \tag{1} $$

and

$$ A_3 = 1 - s(I, I_\pi), \tag{2} $$

where $r()$ and $s()$ are the Pearson and Spearman correlation coefficients (Press et al. 2002),
respectively, $I$ is the image and $I_\pi$ its $\pi$-rotated version. The rationale behind this
formulation is that those pixels made up mostly of noise will not contribute to $A_2$ and $A_3$,
since the correlation between them will tend to zero. Further, correlation coefficients are more
immune to convolution and thus less affected by seeing effects. The Pearson coefficient tends to
accumulate close to unity and so $A_2$ has proven not as useful as $A_1$ and $A_3$. The center of
rotation is chosen to minimize the asymmetry measurements.
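
A correlation-based asymmetry of this kind can be sketched in a few lines. The example below is illustrative only and makes several simplifying assumptions: the image is a small list of rows already centred on the galaxy, and the Petrosian-region selection, the central mask and the search for the rotation centre are omitted. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def asymmetries(image):
    """Return (A2, A3) for an image given as a list of rows."""
    flat = [v for row in image for v in row]
    rotated = flat[::-1]  # reversing the row-major pixels rotates by 180 degrees
    return 1 - pearson(flat, rotated), 1 - spearman(flat, rotated)
```

A point-symmetric image gives $A_2 = A_3 = 0$, while moving flux to one corner raises both values; a constant image has undefined correlation and is not handled by this sketch.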

4.3. Gini coefficient 

The Gini coefficient $G$ measures the flux distribution among the pixels of a galaxy image. The
Gini coefficient for the image pixels in the Petrosian region is calculated exactly as shown in
Lotz et al. (2004), i.e., for $n$ pixels with values $I_i$ in increasing order we have

$$ G = \frac{1}{\bar{I}\, n (n-1)} \sum_{i=1}^{n} (2i - n - 1)\, I_i, $$

where $\bar{I}$ is the average value.
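
As a concrete illustration of the expression above, the following sketch evaluates $G$ for a flat list of pixel values. The function name is hypothetical and the selection of pixels inside the Petrosian region is assumed to have been done already.

```python
def gini(pixels):
    """Gini coefficient of a sequence of non-negative pixel values."""
    values = sorted(pixels)            # the formula assumes increasing order
    n = len(values)
    mean = sum(values) / n
    total = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return total / (mean * n * (n - 1))
```

With this normalization, equal flux in every pixel gives $G = 0$ and all the flux in a single pixel gives $G = 1$.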


4.4. Smoothness 

The smoothness coefficient $S$ (a.k.a. clumpiness) in general measures the small scale structures
in the galaxy image. Here we consider three different measures of smoothness. $S_1$ is calculated
as shown in Lotz et al. (2004), except that the filter used is a Hamming window (Hamming 1998)
with size $[R_p/4]$. Following the same reasoning as for asymmetry, we define the modified
smoothness $S_2$ and $S_3$ as

$$ S_2 = 1 - r(I, I^S) \tag{3} $$

and

$$ S_3 = 1 - s(I, I^S), \tag{4} $$

where $I^S$ is the filtered image. As with asymmetry, $S_3$ has proven to be more useful than
$S_2$.


4.5. Entropy 

The entropy of information $H$ (Shannon entropy, e.g. Bishop 2007) is used here to quantify the
distribution of pixel values in the image. For a random variable $I$, the entropy $H(I)$ is the
expected value of the information $-\log[p(I)]$,

$$ H(I) = -\sum_{k=1}^{K} p(I_k) \log[p(I_k)], \tag{5} $$

where $p(I_k)$ is the probability of occurrence of the value $I_k$, $k$ refers to a specific value
and $K$ is the number of bins considered. For discrete variables, $H$ reaches its maximum value
for a uniform distribution, when $p(I_k) = 1/K$ for all $k$, and hence $H_{\max} = \log K$. The
minimum entropy is that of a delta function, for which $H = 0$. We then have the normalized
entropy

$$ \tilde{H}(I) = \frac{H(I)}{H_{\max}}, \qquad 0 \le \tilde{H}(I) \le 1. \tag{6} $$

Smooth galaxies will have low $H$ while clumpy galaxies will have high $H$.
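
Equations (5) and (6) translate directly into a histogram computation. The sketch below is only an illustration: the function name, the equal-width binning and the default number of bins are assumptions, since the text does not specify how the pixel values are binned.

```python
import math


def normalized_entropy(pixels, bins=10):
    """Normalized Shannon entropy H / log(K) of binned pixel values."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return 0.0                      # delta-function distribution
    counts = [0] * bins
    for v in pixels:
        # Equal-width bins (an assumption); the maximum goes in the last bin.
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[k] += 1
    n = len(pixels)
    h = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return h / math.log(bins)
```

A constant image returns 0 and pixel values spread evenly over all bins return 1, matching the two limits quoted above.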


4.6. Spirality 

None of the CASGM parameters take into account the spiral arms, rings and bars in galaxies, even
though they are a major emphasis of human-based classification. We devise a parameter to take them
into account. As done in Shamir (2011), we first transform the standardized galaxy image to polar
coordinates $(r, \theta)$. In $(r, \theta)$ space a bulge appears as a band in the lower region of
the diagram, a bar as two vertical lines and spiral arms as inclined bands (see Figure 1). We then
calculate the gradient magnitude $|\nabla I|$ and direction $\psi$ fields of the polar image. Most
points in this direction field $\psi$ for an elliptical galaxy will point to the bottom, whilst
for a spiral galaxy there will be many orientations corresponding to arms, rings and bars. The
standard deviation $\sigma_\psi$ of the field direction values will be smaller for an elliptical
than for a spiral, and hence can be used to estimate the amount of characteristic structures. To
avoid regions of noise, we make the measurements in regions where the gradient magnitude is
greater than the median of the magnitude field.

Figure 2 shows a density plot of $\sigma_\psi$ versus T-type for the EFIGI database (a subsample
of galaxies was selected such that there are 45 objects in each T-type), where a clear linear
relationship is seen. So, $\sigma_\psi$ is a good diagnostic for T-type, provided there is enough
spatial resolution to distinguish spiral arms, rings and bars. This is the case for the EFIGI
database, marginally for NA and not for LEGACY in general, as inferred from the discussion in
Section 6.1 and Figure 4.

Fig. 1. Illustration of how spirality is measured. EFIGI spiral SB PGC 2182 above and lenticular
PGC 2076 below. From left to right: original image, standardized image (q = 1, PA = 0°), image in
polar coordinates, gradient field of polar image.

Fig. 2. Density plot of $\sigma_\psi$ versus T-type for the EFIGI sample. For this plot, a
subsample of galaxies was selected so that there are 45 galaxies in each T-type.
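
The last steps of the spirality measurement can be sketched as follows. This is a toy illustration under explicit assumptions: the image is taken to be already standardized and resampled to polar coordinates (rows index $r$, columns index $\theta$), the gradient is estimated with central differences, and a plain (non-circular) standard deviation of the angles is used, as the text describes. The function name and the synthetic images are hypothetical.

```python
import math
import statistics


def spirality(polar):
    """Standard deviation of gradient directions in a polar image.

    polar: list of rows; rows index radius r, columns index angle theta.
    Only pixels whose gradient magnitude exceeds the median are used.
    """
    mags, dirs = [], []
    for i in range(1, len(polar) - 1):
        for j in range(1, len(polar[0]) - 1):
            g_r = (polar[i + 1][j] - polar[i - 1][j]) / 2   # radial component
            g_t = (polar[i][j + 1] - polar[i][j - 1]) / 2   # angular component
            mags.append(math.hypot(g_r, g_t))
            dirs.append(math.atan2(g_r, g_t))
    cut = statistics.median(mags)
    selected = [d for m, d in zip(mags, dirs) if m > cut]
    return statistics.pstdev(selected) if selected else 0.0


n_r, n_t = 30, 60
# Smooth radial profile only: every gradient points along r.
smooth = [[math.exp(-0.2 * i) for j in range(n_t)] for i in range(n_r)]
# Same profile modulated by an inclined band, mimicking a spiral arm.
armed = [[math.exp(-0.2 * i) * (1 + 0.5 * math.sin(0.4 * j - 0.6 * i))
          for j in range(n_t)] for i in range(n_r)]
```

For the purely radial profile all selected gradients share one direction, so the returned value is zero, whereas the inclined band spreads the directions and yields a clearly larger value.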



5. THE MORFOMETRYKA ALGORITHM 

We developed a standalone application, called Morfometryka^1 (mfmtk), to automatically perform
all the structural and morphometric measurements over a galaxy image. mfmtk reads the input stamp
image and related PSF for a given galaxy and performs various measurements explained in detail in
the Appendix. mfmtk is currently implemented in an object-oriented fashion in Python^2 with the
aid of the scientific libraries SciPy and NumPy (Oliphant 2007), Matplotlib (Hunter 2007) and
PyFits^3.

^1 http://morfometryka.ferrari.pro.br

^2 Python Software Foundation. Python Language Reference, version 2.7. Available at
http://www.python.org

^3 PyFits is a product of the Space Telescope Science Institute, which is operated by AURA for
NASA.

The Morfometryka basic output is: sky background value and standard deviation; image centers
$(x_0, y_0)_{\rm col}$ and $(x_0, y_0)_{\rm peak}$; Sérsic parameters for 1D surface brightness
profile fitting ($I_{n1D}$, $R_{n1D}$, $n_{1D}$) and for 2D image fitting ($I_{n2D}$, $R_{n2D}$,
$n_{2D}$, $q_{2D}$, $PA_{2D}$, $(x_0, y_0)_{2D}$); Petrosian radius $R_p$; radii $R_{20}$,
$R_{50}$, $R_{80}$, $R_{90}$ and concentrations $C_1$ and $C_2$; asymmetries $A_1$, $A_2$, $A_3$
and fitted center for $A_1$ and $A_3$; smoothness $S_1$ and $S_3$; Gini coefficient $G$; second
moment $M_{20}$; gradient field direction value $\psi$ and standard deviation $\sigma_\psi$;
quality flags QF. Optionally, all maps (star masks, segmentation map, polar image and so on) are
saved. Morfometryka takes about 12 seconds to process a 256 x 256 galaxy image and 45 x 45 PSF
image on a 2.5 GHz processor. The version used in this work was 5.0.


6. SUPERVISED CLASSIFICATION

Our morphological classification is based on the linear discriminant method, which separates
galaxies into two main classes (E and S) in morphometric parameter space. We train the classifier
using the classification from the Galaxy Zoo (Lintott et al. 2008, 2011). The process was done
independently for the EFIGI, NA, LEGACY and LEGACY-zr datasets.

Our main goal is not only to classify galaxies in a way that reproduces the human classification
but also to establish a basis for a morphometric space where galaxy classes are separated,
allowing further studies where the human classifier cannot be used. Thus, we use a linear
discriminant and we also seek the smallest set of independent parameters that may yield a reliable
and physically meaningful classification.


6.1. Feature Selection 

Fig. 3. Example Morfometryka graphical output for the EFIGI r-band image of PGC 9445 (ObjID-DR7
587731513150734392). Top, from left to right: PGC0009445_r: original image (dark green line is the
primary segmentation, red line is the 2D Sérsic fit, dotted light green line is the Petrosian
region of 2 $R_p$, colored characters mark different centers); model: 2D Sérsic model image
(insert showing the PSF at bottom left); residual: image minus 2D Sérsic model; A1 map: asymmetry
map used to compute $A_1$; S1 map: smoothness map used to compute $S_1$. Bottom, left to right:
various measurements (see text for details); brightness profile (arbitrary units): black dots are
measurements, red line is the 1D and yellow line the 2D Sérsic fit; polar: map used to compute the
image gradient and $\sigma_\psi$. Its morphometric index is $M_i$ = 0.16.

Given that we have so many measured quantities for each galaxy, some of them may be redundant or
irrelevant and we need to select those which are more relevant to the classification algorithm.
Many feature selection algorithms tend to diminish the importance of quantities that correlate
with each other. A criterion that avoids that is the Maximum Information Content (MIC, Albanese et
al. 2013). MIC is based on the mutual information and the information entropy: it compares, given
the parameters and the known class, which one possesses the greater mutual information with the
class variable, i.e. which one will have greater impact in the classification. The normalized
values for MIC are shown in Figure 4.
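
The idea of ranking features by the information they share with the class variable can be illustrated with a plain histogram estimate of the mutual information. This sketch is not the MIC estimator of Albanese et al. (2013); it is a simpler stand-in, and the function name, the equal-width binning and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(feature, labels, bins=10):
    """Mutual information (in nats) between a binned feature and class labels."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) or 1.0            # avoid division by zero for constants
    binned = [min(int((v - lo) / width * bins), bins - 1) for v in feature]
    n = len(labels)
    joint = Counter(zip(binned, labels))
    n_bin = Counter(binned)
    n_cls = Counter(labels)
    # Sum of p(b, k) * log[p(b, k) / (p(b) p(k))] over occupied cells.
    return sum((c / n) * math.log(c * n / (n_bin[b] * n_cls[k]))
               for (b, k), c in joint.items())


# Toy data (hypothetical): one feature that separates the classes, one that does not.
labels = ["E"] * 50 + ["S"] * 50
informative = [0.0] * 50 + [1.0] * 50
uninformative = [float(i % 2) for i in range(100)]
```

A feature that separates two balanced classes perfectly reaches the class entropy, $\ln 2$, while a feature independent of the class gives zero, so the ratio to the class entropy provides a normalized score between 0 and 1.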

We may have more information on how efficiently each feature helps to separate the classes by
examining the feature histograms separated by classes, as shown in Appendix D, in Figure 10
(EFIGI), Fig. 11 (NA), Fig. 12 (LEGACY) and Fig. 13 (LEGACY-zr). We must be aware that we are
seeing marginal probability distribution functions (PDFs) of each variable, and this is not
equivalent to analysing the multivariate PDF of all parameters together.

First, we note that the features with highest discriminant power are those related to the light
concentration (Sérsic $n$, $C_1$ and $C_2$). Since they are equivalent, and also equivalent to the
Petrosian radii (see the Appendix), we retain only $C_1$ and $C_2$, which are non-parametric and
more robust.

Comparing the asymmetry measures in the histograms in Appendix D, we see that $A_3$ is able to
discriminate classes better than $A_1$, which is confirmed by the MIC values. The Gini coefficient
is very poor at separating E from S, as is $M_{20}$. Compared to Gini, the entropy $H$ works
better in separating classes, and the newly introduced $S_3$ is better than the original $S_1$.
The axis ratio $q$ is good for E but indifferent for S, so it is not used. The spirality
$\sigma_\psi$ is also good at discriminating classes but it is crucially dependent on angular
resolution: its importance decreases from EFIGI to NA to LEGACY, in the same sense as the mean
angular resolution decreases.

Finally, based on the MIC analysis, we choose the set of parameters

$$ \mathbf{x} = \{C_1, A_3, S_3, H, \sigma_\psi\}, $$

for they constitute a minimal set of independent parameters that yield a reliable classification.
Four of the chosen parameters are new, used here for the first time. $A_3$ and $S_3$ are enhanced
versions of standard parameters, $H$ is applied for the first time in morphometric studies and
$\sigma_\psi$ is completely new.

6.2. Linear Discriminant Analysis 

A simple linear classifier may be represented by a 
discriminant function which for a given input 
morphometric measurements 
gives 

/(x) = w^x-b Wo, (7) 

where w is the weight vector and wq is the threshold. The 
input vector is assigned to the class Ci if /(x) > 0 and 
to C 2 otherwise. The decision boundary or surface is a 
hyperplane defined by /(x) = 0 , for which w is a normal 
vector and —wo/||w|| its normal distance to the origin. 
The decision function corresponds to the perpendicular 
distance from x to the decision surface. 


X that contains d 
20M|||Bishoi^[2007) 


vector 

(Duda 
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Fig. 4. — Feature Relative Importance for the morphometric pa¬ 
rameters as calculated by the Maximum Information Content cri¬ 
terion. 



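When using the Bayes Decision Theory, the expressions for $\mathbf{w}$ and $w_0$
follow from the rule that an object belongs to class $C_1$ if

$$P(C_1|\mathbf{x}) > P(C_2|\mathbf{x}) \tag{8}$$

and to $C_2$ otherwise. Since the evidence $p(\mathbf{x})$ is the same
for both classes, Bayes rule applied to Eq. (8) is equivalent to

$$p(\mathbf{x}|C_1)\,P(C_1) > p(\mathbf{x}|C_2)\,P(C_2), \tag{9}$$

where $p(\mathbf{x}|C_i)$ is the class conditional probability density
function (CCPDF) and $P(C_i)$ the prior. We assume that
the CCPDF is a multivariate normal density

$$p(\mathbf{x}|C_i) = \frac{1}{(2\pi)^{d/2}|\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{T}\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right], \tag{10}$$

where $\boldsymbol{\mu}_i$ is the mean and $\Sigma_i$ the covariance matrix of
$\mathbf{x}$ for class $C_i$, and $d$ is the number of features. The decision
rule Eq. (9), or equivalently its logarithm, is then

$$-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^{T}\Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) + \ln P(C_1) > -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^{T}\Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) + \ln P(C_2), \tag{11}$$

where the normalization terms $-\frac{1}{2}\ln|\Sigma_i|$ are left out because
they cancel for the identical covariance matrices adopted below.
The terms involving $\mathbf{x}^{T}\Sigma_i^{-1}\mathbf{x}$ are general quadratic forms
and, if we keep them when expanding Eq. (11), we have a quadratic
classifier. Instead, we consider identical covariance
matrices $\Sigma_1 = \Sigma_2 = \Sigma$, which yield a linear classifier.
Expanding the terms in Eq. (11), and ignoring those that
are identical for both classes, we have

$$\mathbf{x}^{T}\Sigma^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) - \frac{1}{2}\boldsymbol{\mu}_1^{T}\Sigma^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^{T}\Sigma^{-1}\boldsymbol{\mu}_2 + \ln\frac{P(C_1)}{P(C_2)} > 0. \tag{12}$$

If we refer to Eqs. (7) and (12) we have then

$$\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) \tag{13}$$

$$w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^{T}\Sigma^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^{T}\Sigma^{-1}\boldsymbol{\mu}_2 + \ln\frac{P(C_1)}{P(C_2)}, \tag{14}$$

which completes our linear classifier.


6.3. Classifier Performance

We may estimate the classifier performance by means
of a confusion matrix or contingency table, which
compares the actual class with the predicted class for
each object. The performance is evaluated by calculating
several scores based on true positives (TP, hits), true
negatives (TN, correct rejections), false positives (FP,
false alarms) and false negatives (FN, misses). See for
example Hackeling (2014).

The accuracy A is the fraction of correct classifications
relative to the total number of classifications

$$A = \frac{TP + TN}{TP + TN + FP + FN}.$$

Precision P is the fraction of positive predictions that
are correct

$$P = \frac{TP}{TP + FP}.$$

Sensitivity R is the fraction of the truly positive instances
that the classifier recognizes

$$R = \frac{TP}{TP + FN},$$

and the $F_1$ score is the harmonic mean between sensitivity
and precision

$$F_1 = \frac{2TP}{2TP + FP + FN}.$$


TABLE 2
Mean scores for each database for a 10-fold cross validation test.

Score   EFIGI   NA      LEGACY  LEGACY-zr
A       0.938   0.902   0.877   0.938
P       0.962   0.931   0.905   0.956
R       0.964   0.899   0.935   0.968
F1      0.963   0.914   0.920   0.963
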

We test the performance of the classifier with a 10-fold
cross validation: for each database we selected the
samples with a known Galaxy Zoo classification and
partitioned them into 10 parts; in each of 10 runs one of the
parts was used as the validation sample and the other 9
parts as the training sample. In each run, the scores A, P,
R and F1 were calculated; their final averages are shown in
Table 2. For all databases the scores are generally at or above
90% (the lowest is the accuracy of 0.877 for LEGACY), namely
about 90% of the time the automated classifier agrees with the
visual classification. If we consider that the performance of the
human classification is also of that order, and that those
classifications were used to train the classifier, then this
performance can be considered very good, and it is the best
figure that we could expect without the classifier also
incorporating the errors of the visual classification.
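
The scores and the 10-fold partition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`confusion_counts`, `scores`, `kfold_indices`) and the toy labels are hypothetical and are not taken from the Morfometryka code; the formulas are those given for A, P, R and F1 in this section.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def scores(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity and F1 from the confusion counts.
    No guard against empty denominators is included in this sketch."""
    a = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return a, p, r, f1

def kfold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k nearly equal validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy labels (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
A, P, R, F1 = scores(*counts)
folds = kfold_indices(25, k=10)
```

In a full cross validation, each fold in `folds` would serve once as the validation sample while the remaining nine are used for training, and the four scores would be averaged over the ten runs.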


7. MORPHOMETRIC INDEX 


As stated in Section 6.2, the discriminant function $f(\mathbf{x})$
is the distance of $\mathbf{x}$ to the plane that separates the classes,
here ellipticals and spirals. Based on that, we propose to
use $f(\mathbf{x})$ to represent the galaxy type, which we call the
morphometric index $M_i$. Figure 5 shows the comparison
of $M_i$ with the T-type from the EFIGI and NA samples.
There is a clear linear relationship between $M_i$ and
T-type, justifying the use of $M_i$ as a morphometric index.
In Section 8 we extend this argument by comparing
$M_i$ with other physical characteristics of galaxies. By
construction, $M_i$ is negative for early-type and positive for
late-type galaxies.

A linear regression between T and $M_i$ could be used to
calibrate $M_i$ as an inferred T-type, but since T is a
subjective parameter we prefer to keep $M_i$ on its own
scale, as a pure morphometric measure estimated
solely from the values of $\mathbf{x}$. For a binary classification
the magnitude of the direction vector $\mathbf{w}$ has no
importance and, in fact, $\|\mathbf{w}\|$ can differ depending
on the details of the LDA. But since we want to use this
distance as a physical measure, we normalize
$\mathbf{w}$ and the threshold $w_0$ in the morphometric
index $M_i$, so that

$$M_i = \hat{\mathbf{w}}^{T}\mathbf{x} + \hat{w}_0, \tag{15}$$

where $\hat{\mathbf{w}} = \mathbf{w}/\|\mathbf{w}\|$ and $\hat{w}_0 = w_0/\|\mathbf{w}\|$.

The final values for the LEGACY database are


$$\hat{\mathbf{w}} = \{-0.832, 0.249, 0.451, 0.190, -0.079\} \tag{16}$$

$$\hat{w}_0 = 0.018 \tag{17}$$
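
The construction of the linear classifier (Eqs. 13 and 14) and the normalization of Eq. (15) can be sketched as follows. This is a minimal illustration, not the LDAclassify code: the function names are hypothetical, the toy data have two features instead of the five morphometric parameters, the priors are taken as the relative class frequencies, and class 1 is assumed to be the late-type class so that positive values of the index mean late-type.

```python
import math

def mean_vec(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def pooled_cov(X1, X2):
    """Shared covariance matrix Sigma, pooled over the two classes."""
    d = len(X1[0])
    S = [[0.0] * d for _ in range(d)]
    for X in (X1, X2):
        mu = mean_vec(X)
        for row in X:
            dev = [row[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += dev[i] * dev[j]
    n = len(X1) + len(X2) - 2
    return [[s / n for s in r] for r in S]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lda_fit(X1, X2):
    """Return (w, w0) as in Eqs. (13)-(14), with priors N_i/N."""
    mu1, mu2 = mean_vec(X1), mean_vec(X2)
    S = pooled_cov(X1, X2)
    w = solve(S, [a - b for a, b in zip(mu1, mu2)])  # Sigma^-1 (mu1 - mu2)
    w0 = (-0.5 * dot(mu1, solve(S, mu1)) + 0.5 * dot(mu2, solve(S, mu2))
          + math.log(len(X1) / len(X2)))
    return w, w0

def morph_index(x, w, w0):
    """Normalized discriminant of Eq. (15): signed distance to the plane."""
    return (dot(w, x) + w0) / math.sqrt(dot(w, w))

# Toy two-feature samples (hypothetical values).
X1 = [[2, 1], [3, 2], [2, 2], [3, 1]]          # class 1, assumed late-type
X2 = [[-2, -1], [-3, -2], [-2, -2], [-3, -1]]  # class 2, assumed early-type
w, w0 = lda_fit(X1, X2)
m_values_1 = [morph_index(x, w, w0) for x in X1]
m_values_2 = [morph_index(x, w, w0) for x in X2]
```

Because the index is divided by the norm of `w`, rescaling `w` and `w0` by a common positive factor leaves the classification and the index unchanged, which is the motivation for the normalization in the text.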


As we see in Eq. (14), $w_0$ depends on the class priors
$P(C_i)$. Usually, the relative frequency of each class,
$N_i/N$ ($N_i$ being the number of objects in class $C_i$ and $N$ the
total number of objects), is used as the prior, since it gives the
probability that a new object belongs to class $C_i$ if we know
nothing about it. But in the EFIGI and NA databases
the relative frequencies are biased, since these samples were
designed to contain approximately the same number of objects
for each morphological T-type. So, the priors, and
hence $w_0$, for EFIGI and NA cannot be applied to other
databases with different relative frequencies. The LEGACY
and LEGACY-zr databases, on the contrary, have priors
that may reflect the real distribution of classes of galaxies,
since no morphology was used to select the objects.
In short, the intercept term in the linear relationships in
Figure 5 may be biased by the selection effects in the
databases; the linear character, however, is unaffected.

The linear regression between T-type and $M_i$, shown
in Fig. 5, is a robust fit using the Theil-Sen estimator,
which computes the median slope among
all pairs of points in a set, as implemented in the Scikit-Learn
library (Pedregosa et al. 2011). Note that $T \approx 20\,M_i$.
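
To make the "median slope among all pairs of points" concrete, the sketch below implements the textbook one-dimensional Theil-Sen estimator in plain Python. It is an illustration only: the fit in the paper uses the Scikit-Learn implementation, whereas the function `theil_sen` and the toy data here are hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Median of pairwise slopes, then median intercept (1-D Theil-Sen)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Points on the line y = 2x + 1 with one strong outlier at x = 3.
x = [0, 1, 2, 3, 4, 5, 6]
y = [2 * xi + 1 for xi in x]
y[3] = 50
slope, intercept = theil_sen(x, y)
```

Most point pairs do not involve the outlier, so the median slope and intercept stay on the underlying line, which is why the estimator is described as robust.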

In order to have $M_i$ as a trustworthy morphological indicator
we need to establish how accurate it is, which in
principle can be done by propagating the errors from each
of the parameters $C_1$, $A_3$, $S_3$, $H$ and $\sigma_\psi$. However, we
find it more realistic to compute the signal-to-noise of
the galaxy as S/N = $I_{n,2D}$/skybgstd (see Appendix A)
and see how $M_i$ varies with it. As we can see from
Figure 6, there seems to be no trend between them; the
blue area indicates that most galaxies have S/N around
10 and an average $M_i$ of about 0.1, slightly
above the value of 0.0 that we would expect.


8. COMPARISON WITH OTHER PHYSICAL PARAMETERS

We present in this paper a new approach for galaxy
morphological classification that is not focused on recovering
the visual classification, although this is done remarkably
well (see Section 6.3). In this work, the parameters
defining the morphology of a galaxy are physically motivated,
and to confirm how successful we were in reaching
this goal we compare $M_i$ with quantities measured from
the spectra of the galaxies we measured.

Figure 7 shows how $\mathrm{Age}_L$ (age weighted by luminosity),
$\mathrm{Age}_M$ (age weighted by mass), eClass (a single-parameter
classifier based on principal component analysis (PCA), retrieved from
the SDSS database), and the central velocity dispersion $\sigma$
correlate with $M_i$. Notice that the SDSS spectra reflect
properties of the central region of galaxies. In Figure 7a
we see that, overall, negative $M_i$ corresponds to older
systems. $\mathrm{Age}_L$ reflects more recent episodes of star formation,
and in this case, as $M_i$ becomes very negative, the
systems do not present any recent star formation, namely
these are very old galaxies. There is a ridge of old systems
extending from $M_i = -0.5$ up to 0.1 and then a significant
drop in age as $M_i$ tends to 0.2. For $0.1 < M_i < 0.4$
we see that $\mathrm{Age}_L$ is around 1.5 Gyr. These are the late-type
spirals, which exhibit a considerable amount of star
formation. This figure clearly shows that morphology,
in this case expressed by the parameter $M_i$, varies continuously
from old to young stellar populations, which is
an important aspect of any morphological quantifier: to
reflect the stellar population properties of galaxies.

Figure 7b is similar to Figure 7a but plots $\mathrm{Age}_M$
instead, which, contrary to $\mathrm{Age}_L$, reflects more the whole
star formation history of a galaxy. The same trend is
seen here; however, since the SDSS spectra sample only
the central region of the galaxies, there is a population
that dominates the figure for $\mathrm{Age}_M$ around 10.0 Gyr,
from early (negative $M_i$) to late types (positive $M_i$).

Figure 7c shows how $M_i$ is related to eClass, a parameter
designed to express differences in stellar populations
among different galaxies, thus serving as a
discriminant between early- and late-type systems. We
find that the relation between these two quantities is not
linear, whereas a linear relation would be expected if both
reflected morphology one-to-one. What we see is
that for $-0.5 < M_i < 0.2$ eClass is concentrated around
-0.15 (early-type systems), with a scatter that increases as
$M_i$ increases. Then for $M_i > 0.2$ eClass increases steadily,
reaching an eClass of 0.5 for $0.2 < M_i < 0.4$. Both eClass
and $M_i$ are associated with morphology, although eClass
is primarily associated with stellar population and $M_i$ is
derived solely from image morphometry. $M_i$ is more
sensitive to morphology, particularly in the early-type
systems domain ($M_i < 0$).

Finally, in Figure 7d we present the relation with the
central velocity dispersion $\sigma$ (corrected for an aperture
of $R_e/8$, where $R_e$ is the effective radius of the galaxy).
Even though the scatter is large, a clear relation exists
between $M_i$ and $\sigma$, which is remarkable considering that
$M_i$ is solely photometric. In summary, these comparisons
show that $M_i$ is reliable in separating the different
morphological types according to their stellar population
properties, a performance not seen in other previously
proposed morphological quantifiers.


Fig. 5.— The relationship between the morphological T-type and the morphometric index $M_i$ for the EFIGI (above) and NA (below)
databases. Small solid dots are individual galaxies; large circles indicate the mean $M_i$ for each integer T-type, with size proportional to
the number of objects; error bars indicate the standard deviation in $M_i$. Contours are drawn for a kernel representation of the points. The dashed lines
are the best linear regressions, whose parameters are shown on top of each plot.


9. SUMMARY

We present a new method to establish the morphological
classification of galaxies that is physically motivated,
although it matches what is done visually for the very
nearby Universe equally well. In the following, we summarize
the main aspects of the classification system proposed
here and the verification analysis:

1 - We developed a pipeline that automatically estimates
morphometric parameters from galaxy images.
Measured parameters include Concentration, $C_1$, Asymmetry,
$A_3$, and Smoothness, $S_3$, which were slightly modified
with respect to the conventional ones. We also make
use of two new parameters: entropy $H$ and spirality
$\sigma_\psi$.

2 - Morfometryka measures several quantities per
galaxy, which raises the question of which ones are more
adequate for establishing the morphological type of the
system. We use a method called Maximum Information
Content (MIC) to select the relevant features while avoiding
redundancy. The newly introduced morphometric parameters
have a better discriminant power than the previously
used ones. The MIC analysis resulted in the minimum number
of independent parameters listed in item 1. The
relationship between concentration, Petrosian radius and
Sersic index $n$ is derived in Appendices B and C.

3 - Our supervised classification is based on Galaxy
Zoo and tested with different datasets: EFIGI, NA,
LEGACY and LEGACY-zr. The Linear Discriminant
Analysis (LDA) method is used to determine the decision
surface that separates early- from late-type systems,
and the distance from this surface indicates how early
or late the system is. It is exactly this distance that we
propose as a morphological index, $M_i$.

4 - Classification performance was evaluated using the
confusion matrix, from which we measured accuracy,
precision and sensitivity scores, with a 10-fold cross validation
scheme. We obtain final scores of about 90% or better.

5 - Another independent validation comes from comparing
$M_i$ with stellar population quantities and velocity
dispersion, which were established using the spectra
available in DR7 together with the spectral fitting code
STARLIGHT. We note that $M_i$ correlates with eClass,
and the comparison shows that classifying early-type galaxies
solely as eClass < 0 can significantly contaminate the sample
with late-type systems which have $M_i > 0.2$.

We thank the referee Ronald Buta for many suggestions
that helped improve the manuscript. We
Fig. 6.— Comparison between the signal-to-noise of the galaxy,
defined as S/N = $I_{n,2D}$/skybgstd, and the morphometric index $M_i$.

APPENDIX 

A. MORFOMETRYKA ALGORITHM DETAILS 

Here, we provide a detailed description of the various measurements in Morfometryka. mfmtk is logically divided
into four main blocks (classes in programming parlance): Stamp - basic data reading and low-level, low-complexity
geometrical measurements; Photometry - luminosity distribution, star masking and Petrosian radius estimation;
Sersic - 1D and 2D luminosity distribution fitting; Morphometry - measurements of the morphometric parameters
used later on for establishing the galaxy's morphology. The package also includes the auxiliary applications makemySDSS,
for retrieving SDSS frames and cutting stamps, and LDAclassify, to perform the Linear Discriminant Analysis. In
the following, logical units are written in SMALL CAPS and algorithm code in typewriter.

A.1. Cutting stamps

The list of all objects, containing objID, RA, DEC, run, rerun, camcol, field and petroRad_r, is generated
with the following SQL query on SDSS CasJobs



SELECT p.objID, p.ra, p.dec, p.run, p.rerun, p.camcol, p.field, p.petroRad_r
FROM DR7.SpecObj AS s JOIN DR7.PhotoObj AS p ON s.bestObjID = p.objID
WHERE s.specclass = 2


From this list, we build a set of unique combinations of (run, rerun, camcol, field) and the required SDSS Frames
and psFields are downloaded. We proceed exactly as for DR7 Frames and psFields, but we download the DR10 files,
since they refer to the same region of the sky, i.e. the raw data are the same, but the image processing algorithms
were improved from DR7 to DR10. Also, DR10 frames are calibrated in nanomaggies¹ (Lupton et al. 1999). For each
object, the corresponding Frame is loaded and a square region of size 10 petroRad_r centered on the object's RA and DEC
is cut. The PSF for the same position is generated with the SDSS read_PSF application from the psField file. The
stamp FITS file header is updated with the astrometry and relevant frame keywords. If the object is on the frame
border, i.e., if it has less than 90% of its pixels in the frame, a FITS header keyword FLAGINC and a header comment
"MFMTK: incomplete stamp" are written.
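
The stamp geometry and the 90% completeness test can be illustrated with a short sketch. It is not the makemySDSS code: the function `stamp_bounds`, the assumption that the object position and Petrosian radius are already expressed in pixels, and the frame shape used in the example are all hypothetical.

```python
def stamp_bounds(xc, yc, petro_rad_pix, frame_shape, size_factor=10):
    """Square cutout of side size_factor * petroRad centred on (xc, yc).

    Returns the slice limits (y0, y1, x0, x1) clipped to the frame and the
    fraction of the requested stamp that lies inside the frame.
    Assumes petro_rad_pix is large enough for the stamp to be non-empty."""
    ny, nx = frame_shape
    half = int(round(size_factor * petro_rad_pix / 2))
    x0, x1 = int(round(xc)) - half, int(round(xc)) + half
    y0, y1 = int(round(yc)) - half, int(round(yc)) + half
    cx0, cx1 = max(x0, 0), min(x1, nx)
    cy0, cy1 = max(y0, 0), min(y1, ny)
    inside = max(cx1 - cx0, 0) * max(cy1 - cy0, 0)
    frac = inside / ((x1 - x0) * (y1 - y0))
    return (cy0, cy1, cx0, cx1), frac

# Object close to the left edge of a frame (illustrative numbers).
bounds, frac = stamp_bounds(30, 700, 8, frame_shape=(1489, 2048))
incomplete = frac < 0.90  # condition under which FLAGINC would be written
```

In this example only 87.5% of the requested stamp falls inside the frame, so the stamp would be flagged as incomplete.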


A.2. Basic image processing 

The process starts with the target image gal0 and the associated PSF, whose size is measured from its second moment
collapsed in the y direction. The sky background skybg is estimated from the median of all pixels in the four
corners of the image (squares with a typical width of 10 pixels, skyboxsz)². The accuracy of the sky background estimate,
skybgstd, is set by the standard deviation of the aforementioned set of pixels.
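
A minimal sketch of the corner-based sky estimate follows. The names `skybg`, `skybgstd` and `skyboxsz` come from the text; the function itself, the use of the sample standard deviation and the toy image are assumptions made for illustration.

```python
from statistics import median, stdev

def sky_background(img, skyboxsz=10):
    """Median and standard deviation of the pixels in the four corner boxes.

    img is a list of rows; each corner box is skyboxsz x skyboxsz pixels."""
    ny, nx = len(img), len(img[0])
    k = skyboxsz
    pix = []
    for rows in (range(0, k), range(ny - k, ny)):
        for cols in (range(0, k), range(nx - k, nx)):
            pix.extend(img[r][c] for r in rows for c in cols)
    return median(pix), stdev(pix)

# Flat background of 5 counts with a small bright "star" in one corner.
img = [[5.0] * 40 for _ in range(40)]
for r, c in ((0, 0), (0, 1), (1, 0), (1, 1)):
    img[r][c] = 1000.0
skybg, skybgstd = sky_background(img)
```

The star occupies four of the 400 corner pixels and leaves the median unchanged, which illustrates why the median is preferred over the mean here.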

¹ https://www.sdss3.org/dr10/algorithms/magnitudes.php

² The median estimate is less affected than the mean by outliers,
and such a sky background estimate has proven to be accurate
enough even if a star occupies several pixels in one of the corners.

Fig. 7.— Relationship between $M_i$ and $\mathrm{Age}_L$ (age weighted by luminosity), $\mathrm{Age}_M$ (age weighted by mass), eClass, and central velocity
dispersion $\sigma$.
The segmentation is done on the gal0fltr image, which is the gal0 image median filtered with a window of
size segS (typically 5 pixels); high frequencies are filtered from the image to avoid sharp edges in the segmented
regions. Regions are then selected by histogram thresholding: those pixels whose intensity is greater than the
threshold median(gal0fltr) + segK · mad(gal0fltr) are selected, where mad is the median absolute deviation. This
threshold selects pixels segK mad above the median, which is similar to sigma-clipping K standard deviations above the
mean, except that the median and the median absolute deviation (mad) are used, which are more robust to outliers and
intensity variations. This histogram thresholding operates on the intensity space only and may select regions that
are not spatially connected. The spatial information is taken into account by performing a connected-component
labeling, where 4-connected pixels receive the same label. At this stage the segmentation consists of one or more
labelled regions. The final segmented region is then selected either by size (largest) or by position (center of light
closest to the image center), depending on the configuration. For SDSS stamps the position criterion is used. A segmentation
mask segmask is made from the selected region, from which a segmented galaxy image gal0seg is derived; in both,
pixels outside the segmentation region are set to zero. Geometric measurements in this section are done on the gal0seg image.
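
The thresholding and connected-component steps can be sketched as below. This is a simplified illustration rather than the mfmtk implementation: the median filtering step is omitted, the functions `mad`, `segment` and `largest_region` are hypothetical, and the value segK = 3 is chosen only for the example.

```python
from collections import deque
from statistics import median

def mad(values):
    """Median absolute deviation about the median."""
    m = median(values)
    return median(abs(v - m) for v in values)

def segment(img, segK=3.0):
    """Threshold at median + segK * mad, then label 4-connected regions.

    Returns a label image (0 = background) and the number of regions."""
    ny, nx = len(img), len(img[0])
    flat = [v for row in img for v in row]
    thr = median(flat) + segK * mad(flat)
    labels = [[0] * nx for _ in range(ny)]
    nlab = 0
    for y in range(ny):
        for x in range(nx):
            if img[y][x] > thr and labels[y][x] == 0:
                nlab += 1
                labels[y][x] = nlab
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one region
                    cy, cx = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = cy + dy, cx + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and img[yy][xx] > thr and labels[yy][xx] == 0):
                            labels[yy][xx] = nlab
                            queue.append((yy, xx))
    return labels, nlab

def largest_region(labels, nlab):
    """Label of the region with the most pixels (selection by size)."""
    counts = [0] * (nlab + 1)
    for row in labels:
        for lab in row:
            counts[lab] += 1
    return max(range(1, nlab + 1), key=lambda lab: counts[lab])

# Toy image: noisy background, a 3x3 object and a diagonal neighbour pixel.
img = [[1.0 + 0.1 * ((x + y) % 2) for x in range(8)] for y in range(8)]
for y in range(1, 4):
    for x in range(1, 4):
        img[y][x] = 10.0
img[4][4] = 10.0
labels, nlab = segment(img)
main = largest_region(labels, nlab)
```

The pixel at (4, 4) touches the 3x3 object only diagonally, so with 4-connectivity it receives its own label; selecting by size then keeps the 3x3 object.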

The galaxy image center is estimated in two distinct ways. First, the peak center $(x_0, y_0)_{\rm peak}$, referring to the
locus where the intensity is maximum, is estimated from the center of light (first moment) of the 5x5 matrix around
the pixel with highest intensity, attaining sub-pixel precision. Second, the center of light is obtained from the image moments.

For an image $I(x,y)$ the standard image moments are defined as

$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q}\, I(x,y)\, dx\, dy,$$

and the center of light (CoL) $(x_0, y_0)_{\rm CoL}$ is given by $x_0 = m_{10}/m_{00}$ and $y_0 = m_{01}/m_{00}$. The translation-invariant
moments $\mu_{pq}$ are determined by replacing $x$ and $y$ by $(x - x_0)$ and $(y - y_0)$, respectively, in $m_{pq}$. The axis lengths are
given by

$$\lambda_1 = \frac{\sqrt{|\mu_{20} + \mu_{02} + \Delta|}}{m_{00}}, \qquad \lambda_2 = \frac{\sqrt{|\mu_{20} + \mu_{02} - \Delta|}}{m_{00}}, \tag{A1}$$

where $\Delta = \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}$, from which we define

$$a = \max(\lambda_1, \lambda_2), \qquad b = \min(\lambda_1, \lambda_2). \tag{A2}$$

Further, we can calculate the position angle of the main axis by

$$\mathrm{PA} = \frac{1}{2}\arctan\left(\frac{2\mu_{11}}{\mu_{20} - \mu_{02}}\right).$$

Details of the derivation of the relations above are given, for example, in Flusser & Zitová (2009).
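
The moment-based measurements can be sketched for a discrete image as follows. The function `moments` is hypothetical; sums replace the integrals, the axis lengths follow the expressions as read from the text, and `atan2` is used as a quadrant-safe variant of the arctangent in the position angle (an assumption, since the text only gives the arctan form).

```python
import math

def moments(img):
    """Centre of light, axis lengths (a, b) and position angle of an image
    given as a list of rows, using discrete image moments."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    x0, y0 = m10 / m00, m01 / m00
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            dx, dy = x - x0, y - y0
            mu20 += dx * dx * v
            mu02 += dy * dy * v
            mu11 += dx * dy * v
    delta = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam1 = math.sqrt(abs(mu20 + mu02 + delta)) / m00
    lam2 = math.sqrt(abs(mu20 + mu02 - delta)) / m00
    a, b = max(lam1, lam2), min(lam1, lam2)
    pa = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return (x0, y0), a, b, pa

# Uniform 3x7 rectangle elongated along x inside a 5x9 image.
img = [[0.0] * 9 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 8):
        img[y][x] = 1.0
center, a, b, pa = moments(img)
```

For this rectangle the center of light is the geometric center, the position angle is zero, and the axis ratio b/a depends only on the ratio of the second moments, not on the overall normalization of the axis lengths.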

For future use, a standardized version of the segmented galaxy image is calculated: given the parameters estimated
above, an affine transform is applied to gal0seg such that in the resulting stangal image the object is centered in
the image array, has zero position angle and has unit axial ratio. Optionally, the integrated luminosity can be
normalized.


A.3. Photometry Routines 

Photometry is performed by positioning successive ellipses relative to a fixed center $(x_0, y_0)_{\rm peak}$, with constant axis
ratio $b/a$ and position angle PA. The photometric measurements are performed for ellipses with semi-major axis $R$
ranging from 1 pixel up to the size of the image diagonal, in steps of 1 pixel. Later the profiles are truncated (see
below).

The pixels 1 pixel away from the ellipse are called border pixels (ellindxbrdr) and the pixels inside the ellipse are
internal pixels (ellindxin). At a given semi-major axis $R$, the border pixels whose intensity
is above a threshold relative to the pixel group (given by median(ellindxbrdr) + StarSigma · mad(ellindxbrdr))
are masked as stars. The ellipse mean intensity $I(R)$ and associated error $I_{\rm err}(R)$ are calculated as the average and
the standard deviation of the border pixels not masked as stars. The total luminosity $L(R)$ is the sum of the internal
pixels for each ellipse, and the mean intensity $\langle I\rangle(R)$ is given by the ratio of $L(R)$ to the number of pixels inside the
ellipse.


At each semi-major axis iteration, the Petrosian function, Eq. (A6), is evaluated and once it crosses the critical
value $\eta_0 = 5$, the Petrosian radius $R_p$ is evaluated by linearly interpolating $\eta(R)$ between the adjacent points. A
Petrosian Region is defined as an elliptic region of semi-major axis $N_{R_p} \cdot R_p$ (we use $N_{R_p} = 2$) with the same axis ratio
and position angle measured in the segmented image (Section A.2), and stored as petromask. The image galpetro =
petromask · gal0 is defined. The profiles $I(R)$, $I_{\rm err}(R)$ and $\langle I\rangle(R)$ are cut at $N_{R_p} \cdot R_p$.
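
The Petrosian radius estimate by linear interpolation can be sketched as below. The function `petrosian_radius` is hypothetical, and the test profile is an analytic circular exponential disc with an assumed scale length of 10 pixels rather than a measured profile; $\eta$ is taken as the ratio of the mean intensity inside $R$ to the intensity at $R$, with $\eta_0 = 5$.

```python
import math

def petrosian_radius(R, I, I_mean, eta0=5.0):
    """First radius where eta = I_mean / I reaches eta0, found by linear
    interpolation between the two adjacent sampled radii."""
    eta_prev = None
    for k in range(len(R)):
        eta = I_mean[k] / I[k]
        if eta_prev is not None and eta_prev < eta0 <= eta:
            t = (eta0 - eta_prev) / (eta - eta_prev)
            return R[k - 1] + t * (R[k] - R[k - 1])
        eta_prev = eta
    return None  # eta0 is never reached inside the sampled profile

# Analytic exponential profile sampled in steps of 1 pixel.
h = 10.0  # assumed scale length in pixels
R = [float(r) for r in range(1, 101)]
I = [math.exp(-r / h) for r in R]
I_mean = [2 * h * h * (1 - math.exp(-r / h) * (1 + r / h)) / (r * r) for r in R]
Rp = petrosian_radius(R, I, I_mean)
```

For this profile the crossing occurs between 36 and 37 pixels, and the interpolated value agrees closely with the exact solution of $\eta(R_p) = 5$.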

A.4. The Sersic routines 

Standard 1D Sersic parameters are measured by fitting the Sersic law (Sersic 1967)

$$I(R) = I_n \exp\left\{-b_n\left[\left(\frac{R}{R_n}\right)^{1/n} - 1\right]\right\} \quad\text{with}\quad b_n = 2n - \frac{1}{3}, \tag{A3}$$

to the 1D surface brightness profile $I(R)$. The minimizations are done with a Levenberg-Marquardt algorithm, in a
least-squares sense. The fits are bounded by adding a square penalty function for parameters outside the specified
range. The boundaries are: $\min[I(R)] < I_{n,1D} < \max[I(R)]$, $1 < R_{n,1D} < \max[R]$, and $n_{1D}$ bounded above by 50. The output
parameters are $I_{n,1D}$, $R_{n,1D}$, $n_{1D}$.
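
The Sersic law of Eq. (A3) and the idea of bounding a fit with a square penalty can be sketched as follows. Only the objective function is shown; the Levenberg-Marquardt minimizer is not included. The function names, the exact form of the penalty and the bound values in the example (in particular the lower bound on n) are assumptions for illustration.

```python
import math

def sersic(R, In, Rn, n):
    """Sersic law of Eq. (A3) with the approximation b_n = 2n - 1/3."""
    bn = 2.0 * n - 1.0 / 3.0
    return In * math.exp(-bn * ((R / Rn) ** (1.0 / n) - 1.0))

def square_penalty(value, lo, hi):
    """Zero inside [lo, hi]; squared distance to the range outside it."""
    if value < lo:
        return (lo - value) ** 2
    if value > hi:
        return (value - hi) ** 2
    return 0.0

def penalised_chi2(params, R, I, I_err, bounds):
    """Chi-square of the profile plus the penalties of all parameters."""
    In, Rn, n = params
    chi2 = sum(((i - sersic(r, In, Rn, n)) / e) ** 2
               for r, i, e in zip(R, I, I_err))
    return chi2 + sum(square_penalty(p, lo, hi)
                      for p, (lo, hi) in zip(params, bounds))

# Noise-free synthetic profile generated from known parameters.
truth = (2.0, 4.0, 1.5)
R = [1.0, 2.0, 4.0, 8.0]
I = [sersic(r, *truth) for r in R]
I_err = [1.0] * len(R)
bounds = [(0.0, 10.0), (1.0, 8.0), (0.1, 50.0)]  # hypothetical ranges
chi2_true = penalised_chi2(truth, R, I, I_err, bounds)
chi2_out = penalised_chi2((2.0, 9.0, 1.5), R, I, I_err, bounds)
```

Inside the allowed range the penalty vanishes and the objective reduces to the usual chi-square; outside it the penalty grows quadratically, which steers an unconstrained minimizer back towards the range.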

The 2D fitting applies Eq. (A3) convolved with the PSF, with $R$ replaced by the elliptical radius computed from the
axis ratio $q_{2D}$ and the rotated coordinates

$$x' = (x - x_0)\cos(\mathrm{PA}) - (y - y_0)\sin(\mathrm{PA}) \tag{A4}$$

$$y' = -(x - x_0)\sin(\mathrm{PA}) - (y - y_0)\cos(\mathrm{PA}), \tag{A5}$$

where $(x_0, y_0)$ and PA are the center and position angle of the 2D model.
The coordinates $x$, $y$ refer to positions in the galaxy image. The two-dimensional Sersic function is fitted directly to the galaxy
image, except that pixels outside the galaxy, as defined by the Petrosian Region, flagged stars and a central circular
region of 1 PSF FWHM are masked. The algorithm is the same as for the 1D fitting, with the following boundaries: the
center $(x_0, y_0)_{2D}$ cannot vary more than 15% compared to $(x_0, y_0)_{\rm peak}$; $I_{n,2D}$ must be within the range of image pixel values;
$R_{n,2D}$ cannot be greater than the image half-diagonal; $n_{2D}$ is bounded above by 20 and the axis ratio $q_{2D} = b/a$ by 1.
This setup has proven, in simulations and in real galaxies, to be the most stable, converging for most galaxies in the samples. The free
parameters of the fit are $\{x_0, y_0, \mathrm{PA}, q, R_n, n\}_{2D}$.
TABLE 3
Morfometryka quality flags that mark unusual situations.

Flag   NAME          DEC   HEX    CRITERIA
QF0    normal        0     0x00   no unusual situation
QF1    targetsize    1     0x01   psf_fwhm > Rp
QF2    targetisstar  2     0x02   Rn,2D ^ psf_fwhm and n2D ^ > 0.8
QF4    fit1Derror    4     0x04   1D fitting routine did not converge
QF8    fit2Derror    8     0x08   2D fitting routine did not converge
QF16   crowded1      16    0x10   more than 5% of Petro region masked as stars
QF32   crowded2      32    0x20   Rn,2D >
QF64   crowded3      64    0x40   A4 > 10

‘‘too many objects make this

Based on simulations of synthetic Sersic galaxies, we found that the 2D fitting is better than the 1D fitting at recovering the "true"
parameters from images. Since the 2D fit is more sensitive to the initial parameters, we use the 1D results as the initial guess
for the 2D fit.


A.5. Quality Flags 

For reference, a series of conditions are evaluated and informative Quality Flags (QF) are saved. They are not
conclusive, but they indicate when the corresponding condition occurs. For example, if for a given object $R_{n,2D}$ is of the
order of the PSF, $n_{2D} \sim 0.5$ (a Gaussian) and $b/a \sim 1$, the object is probably a star, i.e. a Morfometryka target selection
error. Other QF indicate that the fitting routine did not converge or that the field is crowded. For detecting
crowded fields, we define the asymmetry $A_4$ as the distance between $\mathbf{r}_{\rm CoL} = (x_0, y_0)_{\rm CoL}$ and $\mathbf{r}_{\rm peak} = (x_0, y_0)_{\rm peak}$ in
units of $R_p$, in percentage,

$$A_4 = 100\,\frac{|\mathbf{r}_{\rm CoL} - \mathbf{r}_{\rm peak}|}{R_p},$$

which attains values greater than about 10 in crowded fields, and is used to turn the QF64 on.
The QF are summarized in Table 3.
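
Because the flag values are powers of two, several conditions can be stored in a single integer. The sketch below illustrates this with Python's `IntFlag`; the class `QF`, the function names and the input values are hypothetical, and only the criteria that are fully legible in the table (QF1, QF4, QF8, QF16 and QF64) are evaluated.

```python
import math
from enum import IntFlag

class QF(IntFlag):
    NORMAL = 0
    TARGETSIZE = 1
    TARGETISSTAR = 2
    FIT1DERROR = 4
    FIT2DERROR = 8
    CROWDED1 = 16
    CROWDED2 = 32
    CROWDED3 = 64

def asymmetry_a4(r_col, r_peak, Rp):
    """A4: distance between centre of light and peak, in percent of Rp."""
    return 100.0 * math.hypot(r_col[0] - r_peak[0], r_col[1] - r_peak[1]) / Rp

def evaluate_flags(psf_fwhm, Rp, r_col, r_peak, star_fraction,
                   fit1d_ok=True, fit2d_ok=True):
    """Combine the conditions into one bit mask."""
    flags = QF.NORMAL
    if psf_fwhm > Rp:
        flags |= QF.TARGETSIZE
    if not fit1d_ok:
        flags |= QF.FIT1DERROR
    if not fit2d_ok:
        flags |= QF.FIT2DERROR
    if star_fraction > 0.05:
        flags |= QF.CROWDED1
    if asymmetry_a4(r_col, r_peak, Rp) > 10:
        flags |= QF.CROWDED3
    return flags

# Illustrative object: 2D fit failed, 8% of the region masked, offset centre.
flags = evaluate_flags(psf_fwhm=3.0, Rp=20.0, r_col=(52.0, 50.0),
                       r_peak=(50.0, 46.0), star_fraction=0.08,
                       fit2d_ok=False)
```

Individual conditions can then be tested with expressions such as `QF.CROWDED3 in flags`, while `int(flags)` gives the decimal value listed in the DEC column.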


A.6. Petrosian Quantities

Petrosian (1976) defined a function $\eta(R)$ which is the ratio of the mean intensity inside $R$ to the intensity at the
isophote $R$

$$\eta(R) = \frac{\langle I\rangle(R)}{I(R)}. \tag{A6}$$

The Petrosian radius is the distance from the galaxy center where the ratio in Eq. (A6) has some constant value

$$\eta(R_p) = \eta_0.$$

Here we use $\eta_0 = 5$. The virtue of $\eta$ is that both the numerator and the denominator have the same dependence on
distance, hence $\eta$ is distance independent. The Petrosian radius is used as an implicit scale length for each galaxy.


B. PETROSIAN RADIUS AND SERSIC INDEX EQUIVALENCE 
The mean intensity within radius $R$ for a Sersic model is the integrated luminosity up to $R$ divided by the region
area, $\langle I\rangle = L(R)/A$, $A = \pi R^2$, so for $x = b_n (R/R_n)^{1/n}$ we have (see for example Ciotti & Bertin 1999; Graham &
Driver 2005)

$$\langle I\rangle(R) = 2n I_n e^{b_n}\,\frac{\gamma(2n, x)}{x^{2n}}, \tag{B1}$$

where $\gamma$ is the lower incomplete gamma function. We then have for the Petrosian function, Eq. (A6),

$$\eta(R) = \frac{2n\,\gamma(2n, x)}{e^{-x}\,x^{2n}}. \tag{B2}$$

We have to solve

$$\frac{2n\,\gamma(2n, x_p)}{e^{-x_p}\,x_p^{2n}} = \eta_0 \quad\text{with}\quad x_p = x(R_p), \tag{B3}$$

to obtain $R_p$ as a function of $n$. This equation is transcendental and can only be solved for $R_p$ numerically. However,
for practical purposes, we can write an empirical Petrosian Radius function


Rp{n) = Rn -—— exp 


71 — no 


a 


(B4) 
Fig. 8.— Petrosian radius as a function of n. Top: numerical (dashed) and empirical function (continuous). Bottom: fractional error
between the two.

whose parameters $R_{\max} = 5.8$, $n_0 = -1.11$, a = 2.04 and a = 0.8 provide a fit better than 1% over the range
$0.3 < n < 15$, as shown in Figure 8.
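
The numerical solution of Eq. (B3) can be sketched with the power series of the lower incomplete gamma function and a bisection search. This is an illustration under stated assumptions: the function names are hypothetical, $b_n = 2n - 1/3$ is used as in Eq. (A3), and $\eta_0 = 5$. With the series $\gamma(a,x) = e^{-x}x^{a}\sum_{k\ge 0} x^{k}/[a(a+1)\cdots(a+k)]$, the left-hand side of Eq. (B3) reduces to $\sum_{k\ge 0} x^{k}/[(2n+1)\cdots(2n+k)]$, which is what the code evaluates.

```python
import math

def eta_sersic(x, n, terms=500):
    """Eq. (B2), 2n*gamma(2n,x) / (exp(-x) x^(2n)), summed as a power series."""
    a = 2.0 * n
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total

def petrosian_x(n, eta0=5.0):
    """Solve eta(x_p) = eta0 by bisection; eta grows monotonically with x."""
    lo, hi = 0.0, 1.0
    while eta_sersic(hi, n) < eta0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if eta_sersic(mid, n) < eta0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rp_over_rn(n, eta0=5.0):
    """Petrosian radius in units of R_n, from x_p = b_n (R_p/R_n)^(1/n)."""
    bn = 2.0 * n - 1.0 / 3.0
    return (petrosian_x(n, eta0) / bn) ** n

xp_exponential = petrosian_x(1.0)
ratio_n1 = rp_over_rn(1.0)
ratio_n4 = rp_over_rn(4.0)
```

For n = 1 the series has the closed form $2(e^{x} - 1 - x)/x^{2}$, which gives an independent check of the root found by the bisection.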

C. CONCENTRATION AND SERSIC INDEX EQUIVALENCE 
In the case of the Sersic law, the integrated luminosity within radius $R$ (Ciotti & Bertin 1999) is

$$L(R) = 2\pi n I_n R_n^2\,\frac{e^{b_n}}{b_n^{2n}}\,\gamma(2n, x), \tag{C1}$$

with $x = b_n (R/R_n)^{1/n}$. Hence the total luminosity $L_T = L(R \to \infty)$ is

$$L_T = 2\pi n I_n R_n^2\,\frac{e^{b_n}}{b_n^{2n}}\,\Gamma(2n). \tag{C2}$$

From Eqs. (C1) and (C2) we have the equation for the radius $R_f$ that encloses some fraction $f$ of the total luminosity,

$$\gamma(2n, x_f) = f\,\Gamma(2n) \quad\text{with}\quad x_f = x(R = R_f), \tag{C3}$$

or, for both $R_{f_1}$ and $R_{f_2}$,

$$\frac{\gamma(2n, x_{f_1})}{\gamma(2n, x_{f_2})} = \frac{f_1}{f_2}. \tag{C4}$$

Equation (C4) cannot be solved analytically (except for $n = 1/2$) and the solution must be found numerically. Figure 9
shows the numerical solution for $1/2 < n < 15$.

Again, we can write an empirical function




(C5) 


which approximates the solution with an error smaller than 2% for $C_1$ in the range $1 < n < 15$,
with C" = 2.91, n' = 32.44 and P = 0.48.
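
The numerical solution of Eq. (C3), and the ratio of two enclosing radii that follows from Eq. (C4), can be sketched as below. The function names are hypothetical, and the fractions 0.8 and 0.2 in the example are illustrative choices, not necessarily those used to define the concentration indices. Since $x = b_n (R/R_n)^{1/n}$, the ratio $R_{f_1}/R_{f_2} = (x_{f_1}/x_{f_2})^{n}$ does not depend on $b_n$.

```python
import math

def reg_lower_gamma(a, x, terms=1000):
    """P(a, x) = gamma(a, x) / Gamma(a), evaluated from the power series."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))

def x_fraction(n, f):
    """Solve gamma(2n, x_f) = f * Gamma(2n) for x_f by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(2.0 * n, hi) < f:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(2.0 * n, mid) < f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def radius_ratio(n, f1, f2):
    """R_f1 / R_f2 for a Sersic profile of index n."""
    return (x_fraction(n, f1) / x_fraction(n, f2)) ** n

x_half_n1 = x_fraction(1.0, 0.5)
ratio_gauss = radius_ratio(0.5, 0.8, 0.2)
```

For $n = 1/2$ the equation has the analytic solution $x_f = -\ln(1 - f)$, consistent with the remark in the text, and this provides a direct check of the numerical root.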


D. HISTOGRAM OF MORPHOMETRIC PARAMETERS FOR DATABASES 
Fig. 9.— Concentration as a function of n. Top: numerical (dotted for $C_1$, dashed for $C_2$) and empirical function (continuous). Bottom:
fractional error between the two.



Fig. 10.— Distribution of feature values among morphometric classes for the EFIGI database. 









Fig. 11.— Distribution of feature values among morphometric classes for the NA database. 
Fig. 12.— Distribution of feature values among morphometric classes for the LEGACY complete sample.



Fig. 13.— Distribution of feature values among morphometric classes for the LEGACY-zr sample.